Data Visualization#

Once upon a time there were plots upon plots upon plots.

Load data#

import pandas as pd
import sys
sys.path.append('../')
from source.bokeh_plots import *
from source.data_visualization import *
output_notebook()

file_path = '../data/al_atlas_main_results.xlsx'
model_name = 'AML Epigenomic Risk'

# Read the data
df = pd.read_excel(file_path, index_col=0).sort_index()

# Define train and test samples
df_train = df[df['Train-Test']=='Train Sample']
df_test = df[df['Train-Test'] == 'Test Sample']

# Drop the samples with missing labels for the selected column
df_px = df_train[~df_train['Vital Status'].isna()]

# Drop the samples with missing labels for the WHO 2022 Diagnosis
df_dx = df_train[~df_train['WHO 2022 Diagnosis'].isna()]

# exclude the classes with fewer than 10 samples
df_dx = df_dx[~df_dx['WHO 2022 Diagnosis'].isin([
                                       'MPAL with t(v;11q23.3)/KMT2A-r',
                                       'B-ALL with hypodiploidy',
                                       'AML with t(16;21); FUS::ERG',
                                       'AML with t(9;22); BCR::ABL1'
                                       ])]

# Select diagnostic (Dx) samples from COG trials AAML1031, AAML0531, and AAML03P1
df_cog = df[df['Clinical Trial'].isin(['AAML0531', 'AAML1031', 'AAML03P1'])]
df_cog = df_cog[df_cog['Sample Type'].isin(['Diagnosis', 'Primary Blood Derived Cancer - Bone Marrow',
                                            'Primary Blood Derived Cancer - Peripheral Blood'])]
df_cog = df_cog[~df_cog['Patient_ID'].duplicated(keep='last')]
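
The loading cell above hard-codes the WHO 2022 diagnosis classes that have fewer than 10 samples. As a possible alternative, the same filter could be derived from the class counts. The sketch below is only an illustration: the toy DataFrame is made up, and `drop_rare_classes` is a hypothetical helper that is not part of the notebook's `source` package.

```python
import pandas as pd

def drop_rare_classes(frame, column, min_samples=10):
    """Keep rows whose class in `column` has at least `min_samples` samples.

    Hypothetical helper for illustration; rows with a missing label are dropped too.
    """
    labelled = frame[frame[column].notna()]
    counts = labelled[column].value_counts()
    keep = counts[counts >= min_samples].index
    return labelled[labelled[column].isin(keep)]

# Toy data (made up): one common class, one rare class, one missing label
toy = pd.DataFrame({
    'WHO 2022 Diagnosis': ['Class A'] * 12 + ['Class B'] * 3 + [None],
})
filtered = drop_rare_classes(toy, 'WHO 2022 Diagnosis', min_samples=10)
print(filtered['WHO 2022 Diagnosis'].value_counts())
```

Deriving the list from counts avoids having to update it by hand when the input spreadsheet changes, at the cost of making the excluded classes less visible in the code.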

Interactive atlas#

plot_linked_scatters(df)

Patient Characteristics#

Foundation (unsupervised) model#

from tableone import TableOne
from datetime import date

columns = ['Hematopoietic Entity','Age (group years)','Sex',
            'Clinical Trial',]

mytable_cog = TableOne(df_train.reset_index(), columns,
                        overall=False, missing=True,
                        pval=False, pval_adjust=False,
                        htest_name=True,dip_test=True,
                        tukey_test=True, normal_test=True,

                        order={'FLT3 ITD':['Yes','No'],
                                'Age (group years)':['0-5','5-13','13-39','39-60'],
                                'MRD 1 Status': ['Positive'],
                                'Risk Group': ['High Risk', 'Standard Risk'],
                                'Leucocyte counts (10⁹/L)': ['≥30'],
                                'Age group (years)': ['≥10']})

mytable_cog.to_excel('../data/pt_characteristics_foundation_model_' + str(date.today()) +'.xlsx')

mytable_cog.tabulate(tablefmt="html", 
                        # headers=[score_name,"",'Missing','Discovery','Validation','p-value','Statistical Test']
                        )

| | | Missing | Overall |
|---|---|---|---|
| n | | | 3308 |
| Hematopoietic Entity, n (%) | Acute lymphoblastic leukemia (ALL) | 844 | 700 (28.4) |
| | Acute myeloid leukemia (AML) | | 1207 (49.0) |
| | Acute promyelocytic leukemia (APL) | | 31 (1.3) |
| | Mixed phenotype acute leukemia (MPAL) | | 50 (2.0) |
| | Myelodysplastic syndrome (MDS or MDS-like) | | 225 (9.1) |
| | Otherwise-Normal (Control) | | 251 (10.2) |
| Age (group years), n (%) | 0-5 | 1320 | 480 (24.1) |
| | 5-13 | | 482 (24.2) |
| | 13-39 | | 658 (33.1) |
| | 39-60 | | 165 (8.3) |
| | 60+ | | 203 (10.2) |
| Sex, n (%) | Female | 1511 | 883 (49.1) |
| | Male | | 914 (50.9) |
| Clinical Trial, n (%) | AAML03P1 | 41 | 72 (2.2) |
| | AAML0531 | | 628 (19.2) |
| | AAML1031 | | 581 (17.8) |
| | Beat AML Consortium | | 316 (9.7) |
| | CCG2961 | | 41 (1.3) |
| | CETLAM SMD-09 (MDS-tAML) | | 166 (5.1) |
| | French GRAALL 2003–2005 | | 141 (4.3) |
| | Japanese AML05 | | 64 (2.0) |
| | NOPHO ALL92-2000 | | 933 (28.6) |
| | TARGET ALL | | 131 (4.0) |
| | TCGA AML | | 194 (5.9) |

Fine-tuned (supervised) models#

columns = ['Age (years)','Age group (years)','Sex','Race or ethnic group',
            'Hispanic or Latino ethnic group', 'MRD 1 Status',
            'Leucocyte counts (10⁹/L)', 'BM leukemic blasts (%)',
            'Risk Group','FLT3 ITD', 'Clinical Trial']

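# Work on a copy so the dtype conversion does not write into a slice of df
df_test = df_test.copy()
df_test['Age (years)'] = df_test['Age (years)'].astype(float)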

# join discovery clinical data with validation clinical data
all_cohorts = pd.concat([df_dx, df_px, df_test],
                         axis=0, keys=['AL Epigenomic Phenotype','AML Epigenomic Risk' ,'Validation'],
                         names=['cohort']).reset_index()

# columns = ['Age group (years)','Sex', 'MRD 1 Status',
#             'Leucocyte counts (10⁹/L)',
#             'Risk Group','FLT3 ITD', 'Treatment Arm','Clinical Trial']

mytable_cog = TableOne(all_cohorts, columns,
                        overall=False, missing=False,
                        pval=False, pval_adjust=False,
                        htest_name=True,dip_test=True,
                        tukey_test=True, normal_test=True,

                        order={'FLT3 ITD':['Yes','No'],
                                'Race or ethnic group':['White','Black or African American','Asian'],
                                'MRD 1 Status': ['Positive'],
                                'Risk Group': ['High Risk', 'Standard Risk'],
                                'Leucocyte counts (10⁹/L)': ['≥30'],
                                'Age group (years)': ['≥10']},
                                groupby='cohort')

mytable_cog.to_excel('../data/pt_characteristics_fine-tuned_models_' + str(date.today()) +'.xlsx')

mytable_cog.tabulate(tablefmt="html", 
                        # headers=[score_name,"",score_name,'Validation','p-value','Statistical Test']
)

| | | AL Epigenomic Phenotype | AML Epigenomic Risk | Validation |
|---|---|---|---|---|
| n | | 2445 | 1844 | 201 |
| Age (years), mean (SD) | | 19.3 (19.8) | 19.5 (21.4) | 8.8 (6.0) |
| Age group (years), n (%) | ≥10 | 520 (47.2) | 644 (48.2) | 95 (47.7) |
| | <10 | 581 (52.8) | 693 (51.8) | 104 (52.3) |
| Sex, n (%) | Female | 702 (50.4) | 853 (49.2) | 87 (43.3) |
| | Male | 691 (49.6) | 879 (50.8) | 114 (56.7) |
| Race or ethnic group, n (%) | White | 1052 (80.4) | 1302 (80.4) | 143 (71.9) |
| | Black or African American | 131 (10.0) | 155 (9.6) | 32 (16.1) |
| | Asian | 65 (5.0) | 87 (5.4) | 1 (0.5) |
| | American Indian or Alaska Native | 7 (0.5) | 8 (0.5) | |
| | Native Hawaiian or other Pacific Islander | 7 (0.5) | 10 (0.6) | 2 (1.0) |
| | Other | 46 (3.5) | 57 (3.5) | 21 (10.6) |
| Hispanic or Latino ethnic group, n (%) | Hispanic or Latino | 204 (19.3) | 245 (19.0) | 25 (12.6) |
| | Not Hispanic or Latino | 851 (80.7) | 1044 (81.0) | 174 (87.4) |
| MRD 1 Status, n (%) | Positive | 282 (29.7) | 361 (31.4) | 76 (40.2) |
| | Negative | 667 (70.3) | 787 (68.6) | 113 (59.8) |
| Leucocyte counts (10⁹/L), n (%) | ≥30 | 572 (52.4) | 646 (48.9) | 88 (44.0) |
| | <30 | 520 (47.6) | 676 (51.1) | 112 (56.0) |
| BM leukemic blasts (%), mean (SD) | | 65.8 (24.1) | 65.1 (24.2) | 60.0 (25.6) |
| Risk Group, n (%) | High Risk | 195 (14.1) | 299 (17.5) | 51 (25.4) |
| | Standard Risk | 620 (44.9) | 849 (49.7) | 87 (43.3) |
| | Low Risk | 566 (41.0) | 561 (32.8) | 63 (31.3) |
| FLT3 ITD, n (%) | Yes | 179 (16.3) | 248 (18.6) | 31 (15.6) |
| | No | 920 (83.7) | 1087 (81.4) | 168 (84.4) |
| Clinical Trial, n (%) | AAML03P1 | 62 (2.6) | 72 (4.0) | |
| | AAML0531 | 510 (21.2) | 628 (34.8) | |
| | AAML1031 | 489 (20.3) | 581 (32.2) | |
| | Beat AML Consortium | 192 (8.0) | 225 (12.5) | |
| | CCG2961 | 31 (1.3) | 41 (2.3) | |
| | CETLAM SMD-09 (MDS-tAML) | 166 (6.9) | | |
| | French GRAALL 2003–2005 | 141 (5.9) | | |
| | Japanese AML05 | 9 (0.4) | 15 (0.8) | |
| | NOPHO ALL92-2000 | 636 (26.5) | | |
| | TARGET ALL | 50 (2.1) | 47 (2.6) | |
| | TCGA AML | 118 (4.9) | 194 (10.8) | |
| | AML02 | | | 159 (79.1) |
| | AML08 | | | 42 (20.9) |

By prognostic group#

Discovery#

def pt_characteristics_by_model(df, model_name, traintest = 'discovery'):
        columns = ['Age (years)','Age group (years)','Sex','Race or ethnic group',
                'Hispanic or Latino ethnic group', 'MRD 1 Status',
                'Leucocyte counts (10⁹/L)', 'BM leukemic blasts (%)',
                'Risk Group', 'Clinical Trial','FLT3 ITD', 'Treatment Arm']

        mytable_cog = TableOne(df, columns,
                                overall=False, missing=True,
                                pval=True, pval_adjust=False,
                                htest_name=True,dip_test=True,
                                tukey_test=True, normal_test=True,

                                order={'FLT3 ITD':['Yes','No'],
                                        'Race or ethnic group':['White','Black or African American','Asian'],
                                        'MRD 1 Status': ['Positive'],
                                        'Risk Group': ['High Risk', 'Standard Risk'],
                                        'Leucocyte counts (10⁹/L)': ['≥30'],
                                        'Age group (years)': ['≥10']},
                                groupby=model_name)

        mytable_cog.to_excel('../data/pt_characteristics_'+ model_name +'_' + traintest + '_' + str(date.today()) + '.xlsx')

        return mytable_cog.tabulate(tablefmt="html",
                                headers=[model_name + ' ' + traintest,"",'Missing','High','Low','p-value','Statistical Test'])

pt_characteristics_by_model(df_px, model_name, 'discovery')

| AML Epigenomic Risk discovery | | Missing | High | Low | p-value | Statistical Test |
|---|---|---|---|---|---|---|
| n | | | 843 | 1001 | | |
| Age (years), mean (SD) | | 65 | 22.6 (24.3) | 16.7 (18.2) | <0.001 | Two Sample T-test |
| Age group (years), n (%) | ≥10 | 507 | 301 (50.6) | 343 (46.2) | 0.126 | Chi-squared |
| | <10 | | 294 (49.4) | 399 (53.8) | | |
| Sex, n (%) | Female | 112 | 384 (47.4) | 469 (50.9) | 0.165 | Chi-squared |
| | Male | | 426 (52.6) | 453 (49.1) | | |
| Race or ethnic group, n (%) | White | 225 | 609 (79.7) | 693 (81.1) | 0.196 | Chi-squared (warning: expected count < 5) |
| | Black or African American | | 85 (11.1) | 70 (8.2) | | |
| | Asian | | 42 (5.5) | 45 (5.3) | | |
| | American Indian or Alaska Native | | 4 (0.5) | 4 (0.5) | | |
| | Native Hawaiian or other Pacific Islander | | 4 (0.5) | 6 (0.7) | | |
| | Other | | 20 (2.6) | 37 (4.3) | | |
| Hispanic or Latino ethnic group, n (%) | Hispanic or Latino | 555 | 104 (18.2) | 141 (19.6) | 0.565 | Chi-squared |
| | Not Hispanic or Latino | | 467 (81.8) | 577 (80.4) | | |
| MRD 1 Status, n (%) | Positive | 696 | 205 (41.2) | 156 (24.0) | <0.001 | Chi-squared |
| | Negative | | 293 (58.8) | 494 (76.0) | | |
| Leucocyte counts (10⁹/L), n (%) | ≥30 | 522 | 274 (46.7) | 372 (50.6) | 0.172 | Chi-squared |
| | <30 | | 313 (53.3) | 363 (49.4) | | |
| BM leukemic blasts (%), mean (SD) | | 236 | 66.8 (24.7) | 63.6 (23.7) | 0.007 | Two Sample T-test |
| Risk Group, n (%) | High Risk | 135 | 210 (26.4) | 89 (9.7) | <0.001 | Chi-squared |
| | Standard Risk | | 502 (63.1) | 347 (38.0) | | |
| | Low Risk | | 84 (10.6) | 477 (52.2) | | |
| Clinical Trial, n (%) | AAML03P1 | 41 | 39 (4.6) | 33 (3.4) | <0.001 | Chi-squared |
| | AAML0531 | | 268 (31.8) | 360 (37.5) | | |
| | AAML1031 | | 252 (29.9) | 329 (34.3) | | |
| | Beat AML Consortium | | 117 (13.9) | 108 (11.2) | | |
| | CCG2961 | | 28 (3.3) | 13 (1.4) | | |
| | Japanese AML05 | | 8 (0.9) | 7 (0.7) | | |
| | TARGET ALL | | 17 (2.0) | 30 (3.1) | | |
| | TCGA AML | | 114 (13.5) | 80 (8.3) | | |
| FLT3 ITD, n (%) | Yes | 509 | 130 (21.9) | 118 (15.9) | 0.007 | Chi-squared |
| | No | | 464 (78.1) | 623 (84.1) | | |
| Treatment Arm, n (%) | Arm A | 1146 | 127 (41.5) | 183 (46.7) | 0.197 | Chi-squared |
| | Arm B | | 179 (58.5) | 209 (53.3) | | |

Validation#

pt_characteristics_by_model(df_test, model_name, 'validation')

| AML Epigenomic Risk validation | | Missing | High | Low | p-value | Statistical Test |
|---|---|---|---|---|---|---|
| n | | | 75 | 126 | | |
| Age (years), mean (SD) | | 2 | 8.4 (6.2) | 9.0 (5.9) | 0.548 | Two Sample T-test |
| Age group (years), n (%) | ≥10 | 2 | 34 (45.9) | 61 (48.8) | 0.808 | Chi-squared |
| | <10 | | 40 (54.1) | 64 (51.2) | | |
| Sex, n (%) | Female | 0 | 32 (42.7) | 55 (43.7) | 1.000 | Chi-squared |
| | Male | | 43 (57.3) | 71 (56.3) | | |
| Race or ethnic group, n (%) | White | 2 | 49 (67.1) | 94 (74.6) | 0.438 | Chi-squared (warning: expected count < 5) |
| | Black or African American | | 15 (20.5) | 17 (13.5) | | |
| | Asian | | 1 (1.4) | | | |
| | Native Hawaiian or other Pacific Islander | | 1 (1.4) | 1 (0.8) | | |
| | Other | | 7 (9.6) | 14 (11.1) | | |
| Hispanic or Latino ethnic group, n (%) | Hispanic or Latino | 2 | 12 (16.2) | 13 (10.4) | 0.329 | Chi-squared |
| | Not Hispanic or Latino | | 62 (83.8) | 112 (89.6) | | |
| MRD 1 Status, n (%) | Positive | 12 | 35 (50.0) | 41 (34.5) | 0.051 | Chi-squared |
| | Negative | | 35 (50.0) | 78 (65.5) | | |
| Leucocyte counts (10⁹/L), n (%) | ≥30 | 1 | 31 (41.9) | 57 (45.2) | 0.754 | Chi-squared |
| | <30 | | 43 (58.1) | 69 (54.8) | | |
| BM leukemic blasts (%), mean (SD) | | 21 | 61.7 (28.6) | 59.1 (23.9) | 0.545 | Two Sample T-test |
| Risk Group, n (%) | High Risk | 0 | 29 (38.7) | 22 (17.5) | <0.001 | Chi-squared |
| | Standard Risk | | 37 (49.3) | 50 (39.7) | | |
| | Low Risk | | 9 (12.0) | 54 (42.9) | | |
| Clinical Trial, n (%) | AML02 | 0 | 58 (77.3) | 101 (80.2) | 0.766 | Chi-squared |
| | AML08 | | 17 (22.7) | 25 (19.8) | | |
| FLT3 ITD, n (%) | Yes | 2 | 15 (20.3) | 16 (12.8) | 0.229 | Chi-squared |
| | No | | 59 (79.7) | 109 (87.2) | | |
| Treatment Arm, n (%) | Arm A | 2 | 45 (61.6) | 62 (49.2) | 0.122 | Chi-squared |
| | Arm B | | 28 (38.4) | 64 (50.8) | | |

Kaplan-Meier Plots#

Overall study population#

for dataset, trial in zip([df_cog, df_test], 
                          ['COG AML trials', 'Validation cohort']):
    draw_kaplan_meier(model_name=model_name,
                        df=dataset,
                        save_survival_table=False,
                        save_plot=False,
                        show_ci=False,
                        add_risk_counts=False,
                        trialname=trial,
                        figsize=(8,8))

[Figures: Kaplan-Meier plots for 'COG AML trials' and 'Validation cohort' (2 images)]

Per risk group#

for dataset, trial in zip([df_cog, df_test], ['COG AML trials', 'Validation cohort']):

    risk_groups = ['High Risk', 'Low Risk', 'Standard Risk']
    for risk_group in risk_groups:
        draw_kaplan_meier(
            model_name=model_name,
            df=dataset[dataset['Risk Group'] == risk_group],
            save_plot=False,
            save_survival_table=False,
            add_risk_counts=False,
            trialname=f'{trial} {risk_group}',
            figsize=(8, 8))

[Figures: Kaplan-Meier plots per risk group (High Risk, Low Risk, Standard Risk) for 'COG AML trials' and 'Validation cohort' (6 images)]

Per risk group (AAML1831 COG)#

for dataset, trial in zip([df_cog],['COG AML trials']):

    risk_groups = ['High', 'Low', 'Standard']
    for risk_group in risk_groups:
        draw_kaplan_meier(
            model_name=model_name,
            df=dataset[dataset['Risk Group AAML1831'] == risk_group],
            save_plot=False,
            save_survival_table=False,
            add_risk_counts=False,
            trialname=f'{trial} {risk_group} Risk',
            figsize=(8, 8))

[Figures: Kaplan-Meier plots per AAML1831 risk group (High, Low, Standard) for 'COG AML trials' (3 images)]

Forest Plots#

With MRD 1#

for dataset, trial in zip([df_cog, df_test], ['COG AML trials', 'Validation cohort']):
    
    df_ = dataset.copy()
    df_['AML_Epigenomic_Risk'] = df_['AML Epigenomic Risk'] 

    draw_forest_plot(time='os.time',
                        event='os.evnt',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)

    draw_forest_plot(time='efs.time',
                        event='efs.evnt',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)

[Figures: forest plots with MRD 1 for OS and EFS in 'COG AML trials' and 'Validation cohort' (4 images)]

With MRD 1 and BM blast (%)#

for dataset, trial in zip([df_cog, df_test], ['COG AML trials', 'Validation cohort']):
    
    df_ = dataset.copy()
    # include_lowest=True keeps samples with exactly 0% blasts in the '≤50' bin
    df_['BM leukemic blasts (%)'] = pd.cut(df_['BM leukemic blasts (%)'], bins=[0,50,100], labels=['≤50', '>50'], include_lowest=True)
    df_['AML_Epigenomic_Risk'] = df_['AML Epigenomic Risk'] 

    draw_forest_plot_withBMblast(time='os.time',
                        event='os.evnt',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)

    draw_forest_plot_withBMblast(time='efs.time',
                        event='efs.evnt',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)

[Figures: forest plots with MRD 1 and BM blasts for OS and EFS in 'COG AML trials' and 'Validation cohort' (4 images)]
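
`pd.cut` builds right-closed intervals by default, so with `bins=[0, 50, 100]` a value of exactly 0 falls outside the first bin unless `include_lowest=True` is passed. The short check below uses made-up blast percentages to show the difference; it is only an illustration and not part of the analysis.

```python
import pandas as pd

blasts = pd.Series([0, 50, 51, 100])  # made-up BM blast percentages
bins, labels = [0, 50, 100], ['≤50', '>50']

# Default: intervals are (0, 50] and (50, 100], so 0 is left unbinned (NaN)
default_bins = pd.cut(blasts, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left: [0, 50]
inclusive_bins = pd.cut(blasts, bins=bins, labels=labels, include_lowest=True)

print(pd.DataFrame({'blasts': blasts,
                    'default': default_bins,
                    'include_lowest': inclusive_bins}))
```

Whether samples with 0% blasts occur in these cohorts is not stated in the notebook, so this matters only if such values are present.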

Without MRD 1#

for dataset, trial in zip([df_cog, df_test], ['COG AML trials', 'Validation cohort']):
    
    df_ = dataset.copy()
    # include_lowest=True keeps samples with exactly 0% blasts in the '≤50' bin
    df_['BM leukemic blasts (%)'] = pd.cut(df_['BM leukemic blasts (%)'], bins=[0,50,100], labels=['≤50', '>50'], include_lowest=True)
    df_['AML_Epigenomic_Risk'] = df_['AML Epigenomic Risk'] 

    draw_forest_plot_noMRD(time='os.time',
                        event='os.evnt',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)

    draw_forest_plot_noMRD(time='efs.time',
                        event='efs.evnt',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)

[Figures: forest plots without MRD 1 for OS and EFS in 'COG AML trials' and 'Validation cohort' (4 images)]

ROC AUC performance#

AL epigenomic phenotype#

df_dx_auc_train, df_dx_dummies_train = process_dataset_for_multiclass_auc(df_dx)
df_dx_auc_cog, df_dx_dummies_cog = process_dataset_for_multiclass_auc(df_cog)
df_dx_auc_test, df_dx_dummies_test = process_dataset_for_multiclass_auc(df_test)
                                                                        
p1 = plot_multiclass_roc_auc(df_dx_auc_train, df_dx_dummies_train.columns, title='Discovery cohort')
p2 = plot_multiclass_roc_auc(df_dx_auc_cog, df_dx_dummies_cog.columns, title='Discovery COG peds AML Dx')
p3 = plot_multiclass_roc_auc(df_dx_auc_test, df_dx_dummies_test.columns, title='Validation cohort')

# Create a gridplot
p = gridplot([
    [p1, p2, p3,],
    ], toolbar_location='above')

show(p)

AML epigenomic risk (probability) + risk group#

# Probability model
model_name = 'AML Epigenomic Risk P(High Risk)'
p1 = plot_roc_auc_with_riskgroup(df_px, 'os.evnt', model_name , title='Discovery cohort')
p2 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, title='Discovery COG peds AML Dx')
p3 = plot_roc_auc_with_riskgroup(df_test, 'os.evnt', model_name, title='Validation cohort')


p4 = plot_roc_auc_with_riskgroup(df_px, 'os.evnt', model_name , sum_models=True)
p5 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, sum_models=True)
p6 = plot_roc_auc_with_riskgroup(df_test, 'os.evnt', model_name, sum_models=True)

# Create a gridplot
p = gridplot([
    [p1, p2, p3,],
    [p4, p5, p6,],
    ], toolbar_location='above')

show(p)

Note

Sample size may be reduced in the ROC AUC because samples with missing risk group data were removed.
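
To make the note concrete, the sketch below counts how many samples remain once rows with a missing risk group or outcome are dropped. The toy values and the helper `count_usable_samples` are assumptions for illustration; how `plot_roc_auc_with_riskgroup` filters its input internally is not shown in this notebook.

```python
import pandas as pd

def count_usable_samples(frame, required_columns):
    """Return (total rows, rows with no missing value in required_columns)."""
    complete = frame.dropna(subset=required_columns)
    return len(frame), len(complete)

# Toy data (made up): two of five samples have no risk group annotation
toy = pd.DataFrame({
    'os.evnt': [1, 0, 1, 0, 1],
    'AML Epigenomic Risk P(High Risk)': [0.9, 0.2, 0.7, 0.1, 0.6],
    'Risk Group': ['High Risk', None, 'Low Risk', 'Standard Risk', None],
})
total, usable = count_usable_samples(toy, ['os.evnt', 'Risk Group'])
print(f'{usable} of {total} samples have complete data')
```

Reporting both counts next to each ROC curve would make it clear which panels are based on a reduced sample.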

AML epigenomic risk (high-low) + risk group#

# Binary model
model_name = 'AML Epigenomic Risk'
p1 = plot_roc_auc_with_riskgroup(df_px, 'os.evnt', model_name , title='Discovery cohort')
p2 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, title='Discovery COG peds AML Dx')
p3 = plot_roc_auc_with_riskgroup(df_test, 'os.evnt', model_name, title='Validation cohort')


p4 = plot_roc_auc_with_riskgroup(df_px, 'os.evnt', model_name , sum_models=True)
p5 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, sum_models=True)
p6 = plot_roc_auc_with_riskgroup(df_test, 'os.evnt', model_name, sum_models=True)

# Create a gridplot
p = gridplot([
    [p1, p2, p3,],
    [p4, p5, p6,],
    ], toolbar_location='above')

show(p)

AML epigenomic risk + latest risk group (AAML1831 COG)#

# Probability model
model_name = 'AML Epigenomic Risk P(High Risk)'
p1 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name ,risk_group='Risk Group' ,title='Risk group AAML1031-0531')
p2 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, risk_group='Risk Group AAML1831' ,title='Risk group AAML1831')
p3 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, risk_group='Risk Group AAML1831', sum_models=True, title='Risk group AAML1831 + Epigenomic Risk')

# Binary model
model_name = 'AML Epigenomic Risk'
p4 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name ,risk_group='Risk Group')
p5 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, risk_group='Risk Group AAML1831')
p6 = plot_roc_auc_with_riskgroup(df_cog, 'os.evnt', model_name, risk_group='Risk Group AAML1831', sum_models=True)

# Create a gridplot
p = gridplot([
    [p1, p2, p3,],
    [p4, p5, p6,],
    ], toolbar_location='above')

show(p)

Box Plots#

draw_boxplot(df=df_test,x='Risk Group', y='AML Epigenomic Risk P(High Risk)',
                order=['High Risk', 'Standard Risk', 'Low Risk'],
                trialname='StJude trials', hue=model_name,
                save_plot=False, figsize=(4,4))

draw_boxplot(df=df_test,x='MRD 1 Status', y='AML Epigenomic Risk P(High Risk)',
                order=['Positive','Negative'],
                trialname='StJude trials', hue=model_name,
                save_plot=False, figsize=(4,4))

draw_boxplot(df=df_test,x='Primary Cytogenetic Code', y='AML Epigenomic Risk P(High Risk)',
                order='auto',
                trialname='StJude trials', hue=model_name,
                save_plot=False, figsize=(4,4))

[Figures: box plots of AML Epigenomic Risk P(High Risk) by Risk Group, MRD 1 Status, and Primary Cytogenetic Code for 'StJude trials' (3 images)]

Stacked Bar Plots#

model_name = 'AML Epigenomic Risk'
draw_stacked_barplot(df=df_test,x='MRD 1 Status', y=model_name,
             order=['Positive','Negative'],
             trialname='StJude trials', hue=model_name,
             save_plot=False, figsize=(4,3))

draw_stacked_barplot(df=df_test,x='Risk Group', y=model_name,
                order=['High Risk', 'Standard Risk', 'Low Risk'],
                trialname='StJude trials', hue=model_name,
                save_plot=False, figsize=(4,3), fontsize=9)

draw_stacked_barplot(df=df_test,x='Primary Cytogenetic Code', y=model_name,
                order='auto',
                trialname='StJude trials', hue=model_name,
                save_plot=False, figsize=(4,3), fontsize=6)

[Figures: stacked bar plots of AML Epigenomic Risk by MRD 1 Status, Risk Group, and Primary Cytogenetic Code for 'StJude trials' (3 images)]

Sankey plots#

Note

Sankey plots below compare the distribution of categories. The width of the lines is proportional to the number of patients in each group.
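
As a rough illustration of what determines the line widths, the sketch below counts patients for each pair of categories. The toy data and the helper `sankey_link_counts` are hypothetical; the actual `draw_sankey_plot` implementation lives in the notebook's `source` package and may compute its links differently.

```python
import pandas as pd

def sankey_link_counts(frame, left, right):
    """Count patients per (left, right) category pair.

    Rows with a missing value in either column are dropped,
    similar in spirit to nan_action='drop'.
    """
    pairs = frame[[left, right]].dropna()
    return (pairs.groupby([left, right]).size()
                 .reset_index(name='n_patients'))

# Toy data (made up): six patients, one without a risk group
toy = pd.DataFrame({
    'Risk Group': ['High Risk', 'High Risk', 'Low Risk', 'Low Risk', 'Low Risk', None],
    'AML Epigenomic Risk': ['High', 'Low', 'Low', 'Low', 'High', 'Low'],
})
links = sankey_link_counts(toy, 'Risk Group', 'AML Epigenomic Risk')
print(links)
```

Each row of `links` corresponds to one line in a Sankey plot, with `n_patients` setting its width.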

Samples with annotated diagnosis info#

colors = get_custom_color_palette()


draw_sankey_plot(df_train, 'WHO 2022 Diagnosis', 'AL Epigenomic Phenotype', colors,
                 title='Discovery cohort', fig_size=(4, 11),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_cog, 'WHO 2022 Diagnosis', 'AL Epigenomic Phenotype', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(4, 10),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_test, 'WHO 2022 Diagnosis', 'AL Epigenomic Phenotype', colors,
                 title= 'Validation cohort',fig_size=(3, 7),
                 fontsize=8, nan_action='drop')

[Figures: Sankey plots of WHO 2022 Diagnosis vs AL Epigenomic Phenotype for the discovery cohort, COG peds AML Dx samples, and the validation cohort (3 images)]

Predictions in samples with no WHO 2022 diagnosis data available#

draw_sankey_plot(df_train, 'WHO 2022 Diagnosis', 'AL Epigenomic Phenotype', colors,
                 title='Discovery cohort', fig_size=(4, 9),
                 fontsize=8, nan_action='keep only')

draw_sankey_plot(df_cog, 'WHO 2022 Diagnosis', 'AL Epigenomic Phenotype', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(4, 8),
                 fontsize=8, nan_action='keep only')

draw_sankey_plot(df_test, 'WHO 2022 Diagnosis', 'AL Epigenomic Phenotype', colors,
                 title= 'Validation cohort',fig_size=(4, 8),
                 fontsize=8, nan_action='keep only')

[Figures: Sankey plots of AL Epigenomic Phenotype predictions for samples without a WHO 2022 Diagnosis, for the discovery cohort, COG peds AML Dx samples, and the validation cohort (3 images)]

Reason for unclassified samples#

draw_sankey_plot(df_train, 'WHO 2022 Diagnosis', 'Primary Cytogenetic Code', colors,
                 title='Discovery cohort', fig_size=(4, 6),
                 fontsize=8, nan_action='keep only')

draw_sankey_plot(df_cog, 'WHO 2022 Diagnosis', 'Gene Fusion', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(4, 9),
                 fontsize=8, nan_action='keep only')

draw_sankey_plot(df_test, 'WHO 2022 Diagnosis', 'Primary Cytogenetic Code', colors,
                 title= 'Validation cohort',fig_size=(2, 3),
                 fontsize=8, nan_action='keep only')

[Figures: Sankey plots for samples without a WHO 2022 Diagnosis, by Primary Cytogenetic Code (discovery and validation cohorts) and by Gene Fusion (COG peds AML Dx samples) (3 images)]

Risk group comparison in COG#

draw_sankey_plot(df_cog, 'Risk Group', 'Risk Group AAML1831', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(2, 4),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_cog, 'Risk Group AAML1831', 'AML Epigenomic Risk', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(2, 4),
                 fontsize=8, nan_action='drop')

[Figures: Sankey plots of Risk Group vs Risk Group AAML1831, and Risk Group AAML1831 vs AML Epigenomic Risk, for COG peds AML Dx samples (2 images)]

Px and Dx model comparison#

draw_sankey_plot(df_train, 'AML Epigenomic Risk', 'AL Epigenomic Phenotype', colors,
                 title='Discovery cohort', fig_size=(3, 10),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_cog, 'AML Epigenomic Risk', 'AL Epigenomic Phenotype', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(3, 10),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_test, 'AML Epigenomic Risk', 'AL Epigenomic Phenotype', colors,
                 title= 'Validation cohort',fig_size=(3, 8),
                 fontsize=8, nan_action='drop')

[Figures: Sankey plots of AML Epigenomic Risk vs AL Epigenomic Phenotype for the discovery cohort, COG peds AML Dx samples, and the validation cohort (3 images)]

Watermark#

Author: Francisco_Marchi@Lamba_Lab_UF

Python implementation: CPython
Python version       : 3.10.11
IPython version      : 8.20.0

pandas         : 2.2.0
seaborn        : 0.13.2
matplotlib     : 3.8.2
tableone       : 0.8.0
sklearn        : 1.4.0
lifelines      : 0.28.0
statannotations: not installed

Compiler    : GCC 11.3.0
OS          : Linux
Release     : 5.15.133.1-microsoft-standard-WSL2
Machine     : x86_64
Processor   : x86_64
CPU cores   : 6
Architecture: 64bit